Exploring Class Overlap in Classification Challenges: Introducing the R Package

useR! 2024, Salzburg, Austria

Priyanga Dilini Talagala

July 9, 2024

Data Quality Problems

Class Overlapping Problem

  • Class overlap occurs when instances of more than one class share a common region in the data space and are not clearly separable

Class Overlapping Problem

  • Class overlap occurs when instances of more than one class share a common region in the data space and are not clearly separable

  • This overlap can happen due to:

    • Inherent Similarity: Natural similarity between classes

Class Overlapping Problem

  • Class overlap occurs when instances of more than one class share a common region in the data space and are not clearly separable

  • This overlap can happen due to:

    • Inherent Similarity: Natural similarity between classes

    • Noise: Variability or errors in data collection

Class Overlapping Problem

  • Class overlap occurs when instances of more than one class share a common region in the data space and are not clearly separable

  • This overlap can happen due to:

    • Inherent Similarity: Natural similarity between classes

    • Noise: Variability or errors in data collection

    • Feature Representation: Insufficient or inadequate features to separate classes

Class Overlapping Problem

  • Class overlap occurs when instances of more than one class share a common region in the data space and are not clearly separable

  • This overlap can happen due to:

    • Inherent Similarity: Natural similarity between classes

    • Noise: Variability or errors in data collection

    • Feature Representation: Insufficient or inadequate features to separate classes

  • It makes it challenging for classifiers to accurately distinguish between classes

Implications of Class Overlap

  • Classifiers struggle to correctly classify instances due to overlapping regions

Implications of Class Overlap

  • Classifiers struggle to correctly classify instances due to overlapping regions

  • Higher error rates occur in areas where classes overlap, leading to more instances being misclassified

Implications of Class Overlap

  • Classifiers struggle to correctly classify instances due to overlapping regions

  • Higher error rates occur in areas where classes overlap, leading to more instances being misclassified

  • If the problem of class overlap is not addressed, models may become overly complex, leading to overfitting issues where the model performs well on training data but poorly on unseen data

Types of Class Overlapping Problems